Are structural biases at protein termini a signature of vectorial folding? 
Experimental investigations of the biosynthesis of a number of proteins have pointed out that part of the native
structure can be acquired already during translation. We carried out a comprehensive statistical analysis of some
average structural properties of proteins that have been put forward as possible signatures of this progressive
buildup process. Contrary to a widespread belief, it is found that there is no major propensity of the amino
acids to form contacts with residues that are closer to the N terminus. Moreover, it is found that the C terminus
is significantly more compact and locally organized than the N terminus. This bias, though, is also unlikely to be
related to vectorial effects, since it correlates with subtle differences in the primary sequence. These findings
indicate that even if proteins acquire their structure vectorially, no signature of this seems to be detectable in their
average structural properties.



INTRODUCTION 



Anfinsen's principle states that the folded state of an iso- 
lated protein corresponds to the global minimum of the system 
free energy at physiological temperature [1]. Anfinsen's
view is supported by the experimentally observed reversibil- 
ity of the folding process for a large class of proteins. It ap- 
pears so well-established that it provides the conceptual start- 
ing point of most theoretical studies and ab initio computa- 
tional simulations of protein folding J3, H, S |H S HI3 In- 
deed, computational approaches based on molecular dynam- 
ics or stochastic sampling rely on the notion that, indepen- 
dently of the starting unfolded conformation, the interplay of 
amino acid interactions is sufficient to drive a protein to the 
global free energy minimum. 

The validity of Anfinsen's principle appears surprising
considering that the native structure of a protein is the
result of a complicated mechanism which starts with the
ribosomal translation, may involve the action of molecular
chaperones and may end in post-translational modifications.
The experimental investigation of the biosynthesis of spe- 
cific proteins has led to the formulation of the cotranslational 
hypothesisC3 El El El- 

According to it, proteins which 
fold in vivo acquire their spatial structure in the course of 
translation through specific kinetic routes in which the already 
grown peptide influences the folding of the rest of the chain. It 
should be remarked that cotranslational folding is not
necessarily in contradiction with Anfinsen's principle. Indeed,
virtually all the experimental and theoretical investigations of
cotranslational folding state explicitly that the same final
(native) conformations are achieved as a result of the
biosynthesis or of the refolding from the denatured state.
Despite this observation, several putative native structural
signatures of the progressive build-up of nascent proteins
have been put forward
over the years, ranging from the absence of knots in folded 
proteins, to the atypical proximity of the two termini. Some 



of these signatures have later been shown to be void of sta- 
tistical significance Q. To the present day, a feature that is 
still invoked in favour of the progressive quenching of nascent 
proteins into their native structure is the different structural 
organisation of the two termini. Since the N-terminal region 
is the first to exit from the ribosomal tunnel, it is expected to 
be more locally organized and compact than the C-terminal 
region which should grow over the pre-formed protein
scaffold [12, 15, 16, 17]. This stimulating suggestion followed
the observation that the conformation of N-terminal regions
appeared to be easier to predict than their C-terminal
counterparts [15]. More recently, Alexandrov analyzed a
collection of protein conformations with the aim of detecting
signatures of vectorial growth [16]. The study concluded that
in about two thirds
of the analyzed proteins the majority of residues formed more 
contacts with amino acids that preceded rather than followed 
them in the primary sequence. This was interpreted as a clear 
signature of the progressive structural build-up propagating 
from the N terminus. According to ref. [16] this asymmetry
would imply that, typically, the N-terminal part of the protein 
is more compact than the C-terminal one, since "previous" 
contacts in the N-terminal region are, by necessity, local. The 
latter suggestion was, however, not supported by the compari- 
son of the termini in terms of common and intuitive measures 
of compactness. 

In this study we re-examine these issues and several oth- 
ers related to the structural inequality of the N and C termini. 
We find that, according to several definitions of compactness, 
it is the C terminus that is more compact than the N one, in 
contradiction with the result of ref. [16]. The different bias
in compactness is shown to originate from a larger propensity 
of the C terminus to attain helical conformations. To clarify 
whether the observed inequality is compatible with the ther- 
modynamic hypothesis we elucidate its relationship with the 
difference in amino acid composition and sequence of the
two termini. The observed structural bias appears to be en- 
coded in the primary sequences, in agreement with Anfinsen's 
principle. 
METHODS

The structures on which the analysis is performed were ob- 
tained starting from the PDB select list of about 2000 non- 
homologous proteins in the protein data bank fTil fl^l . As 
customary [17, 20, 21], of these structures we retained only
those comprising at least 80 amino acids, without incomplete 
or ambiguous structural information, and not containing sig- 
nal peptides. The identification of putative signal peptides 
(for either prokaryotes or eukaryotes) was carried out using 
the approach of ref. based on two different types of neu- 
ral networks. Proteins that had a probability greater than 50% 
to contain signal peptides according to both methods were re- 
moved from the set. The selection procedure resulted in the 
373 monomeric proteins and 85 multi-domain proteins listed
in Table I.

The results discussed in the following sections are obtained 
through a statistical analysis performed on the monomeric 
proteins only, but are practically unchanged if the multi do- 
main proteins are included in the data set. 

Each protein was analyzed to detect structural biases at 
the two termini and to trace their possible origin back to the 
primary sequence. Several structural measures were used to 
characterise the average properties of terminal segments of 
increasing lengths at the N and C ends. Part of the analysis
is carried out in terms of the contact matrix, $\Delta$. For each
protein, the contact matrix element, $\Delta_{l,m}$, reflects the
spatial proximity of the $l$th and $m$th residues in the protein.
Denoting with $d_{l,m}$ the distance of the corresponding
C$_\alpha$ atoms (taken as interaction centroids for the whole
residues) in the native structures, the strength of the contact
interaction is calculated from the sigmoidal weight:
$\Delta_{l,m} = [1 - \tanh(d_{l,m} - R_c)]/2$.
As customary, the cutoff interaction, $R_c$, was set to 7.5 Å.
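
The sigmoidal weight above can be illustrated with a minimal Python sketch. It assumes that the C-alpha coordinates are already available as (x, y, z) tuples in Å; the function name `contact_matrix` and the choice of a zero diagonal are illustrative assumptions, not part of the original analysis.

```python
import math

def contact_matrix(ca_coords, r_c=7.5):
    """Return the symmetric matrix of sigmoidal contact weights.

    ca_coords: list of (x, y, z) C-alpha positions in Angstrom.
    r_c: interaction cutoff in Angstrom (7.5 in the text).
    The diagonal is left at zero (an assumption of this sketch).
    """
    n = len(ca_coords)
    delta = [[0.0] * n for _ in range(n)]
    for l in range(n):
        for m in range(l + 1, n):
            # Euclidean distance between the two C-alpha atoms
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(ca_coords[l], ca_coords[m])))
            # weight = [1 - tanh(d - r_c)] / 2
            w = 0.5 * (1.0 - math.tanh(d - r_c))
            delta[l][m] = delta[m][l] = w
    return delta
```

With this form the weight is close to 1 for pairs well inside the cutoff, equals 1/2 exactly at the cutoff, and decays to 0 within a few Å beyond it.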

The contact matrix is used to detect the possible preferential
directions along the primary sequence of the contacts
between amino acids. In the same spirit of Alexandrov [16] we
computed the average fraction of previous contacts for each
residue, $r_p$. In terms of the contact matrix, $\Delta$, $r_p$ is
defined as
$r_p(i) = \sum_{j<i} \Delta_{i,j} / \sum_{j \neq i} \Delta_{i,j}$
(note that nearest neighbors are included in the sum). The
previous/forward character of each residue is then assigned
according to whether $r_p$ is greater or smaller than 0.5. It is
important to notice that the asymmetry of the previous/forward
character both at the level of site and of the whole protein is
not in contradiction with the symmetry of the contact matrix,
$\Delta$.
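
As an illustration of the definition above, the following sketch computes the fraction of previous contacts for every site of a chain from a symmetric contact matrix supplied as a list of lists. The function name `previous_fraction` and the convention of returning `None` for a site without contacts are assumptions made for this example only.

```python
def previous_fraction(delta):
    """Fraction of previous contacts for each site.

    delta: symmetric contact matrix (list of lists) with zero diagonal.
    For site i the result is sum_{j<i} delta[i][j] divided by
    sum_{j!=i} delta[i][j]; None is returned for sites with no contacts.
    """
    n = len(delta)
    r_p = []
    for i in range(n):
        previous = sum(delta[i][j] for j in range(i))
        forward = sum(delta[i][j] for j in range(i + 1, n))
        total = previous + forward
        r_p.append(previous / total if total > 0 else None)
    return r_p


# Toy three-residue example: residues 0 and 1 both touch residue 2 only.
toy = [[0, 0, 1],
       [0, 0, 1],
       [1, 1, 0]]
profile = previous_fraction(toy)
```

In the toy example the matrix is symmetric, yet two of the three sites have only forward contacts and one has only previous contacts, so the site-averaged value differs from 0.5. This is the sense in which a previous/forward asymmetry is compatible with a symmetric contact matrix.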

We also considered the average number of contacts, $n_c(i)$,
that amino acids at a given sequence distance, $i$, from the
nearest terminus make with residues that are closer, along the
primary sequence, to the same terminus. The definition of
$n_c(i)$ for the N-terminus is
$n_c(i-1) = \langle \sum_{j<i} \Delta_{i,j} \rangle$,
while for the C-terminus case it is
$n_c(i) = \langle \sum_{j>L-i} \Delta_{L-i,j} \rangle$,
where $\Delta$ is the contact matrix. In these formulae, $L$ is
the length of the protein under consideration and the brackets
denote the average over the proteins in the data set; also,
consecutive residues are excluded from the summation. Since the
data set is built from a non-redundant set of $N_p$ proteins
($N_p = 373$ and 85 respectively for the monomeric and
multimeric ones) the statistical uncertainty on $n_c(i)$ is
calculated as $\sigma_i / \sqrt{N_p}$, where $\sigma_i^2$ is the
second moment of the number of contacts at distance $i$ from
the N (or C) terminus. The statistical significance of the
difference in the value of $n_c(i)$ observed at the N and C
termini is finally calculated using the Student's t-test.
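
A minimal sketch of this procedure is given below. It uses 0-based residue indices, estimates the dispersion with the sample variance about the mean, and compares the two termini through an unequal-variance t statistic; these three choices, like the function names, are assumptions of the sketch and may differ in detail from the procedure actually used.

```python
import math

def terminal_contacts(delta, i, terminus):
    """Contacts of the residue at sequence distance i from a terminus
    with residues closer to that terminus (consecutive one excluded).

    delta: symmetric contact matrix as a list of lists.
    terminus: "N" or "C". Indices are 0-based in this sketch.
    """
    length = len(delta)
    if terminus == "N":
        ref = i
        partners = range(0, ref - 1)           # skips the neighbour ref - 1
    else:
        ref = length - 1 - i
        partners = range(ref + 2, length)      # skips the neighbour ref + 1
    return sum(delta[ref][j] for j in partners)

def mean_and_error(values):
    """Sample mean and its standard error, sigma / sqrt(N)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)

def t_statistic(n_values, c_values):
    """Unequal-variance t statistic for the N versus C difference."""
    mean_n, err_n = mean_and_error(n_values)
    mean_c, err_c = mean_and_error(c_values)
    return (mean_n - mean_c) / math.sqrt(err_n ** 2 + err_c ** 2)
```

In use, `terminal_contacts` would be evaluated once per protein for a fixed distance from each terminus, and the two resulting lists of values passed to `mean_and_error` and `t_statistic`.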

As further measures of compactness of the termini we also
considered the radius of gyration, $R_G(i)$, of the segments
stretching up to the $i$th residue from the N or C termini as
well as the fraction of local contacts. To correlate the observed
structural inequality at the two termini with biases in the
primary sequence we also considered a number of sequence-based
observables as a function of the distance $i$ from the
nearest terminus. In particular we considered

(a) the average hydrophobicity according to the
Kyte-Doolittle scale: Ala=1.8; Cys=2.5; Leu=3.8; Met=1.9;
Glu=-3.5; Gln=-3.5; His=-3.2; Lys=-3.9; Val=4.2;
Ile=4.5; Phe=2.8; Tyr=-1.3; Trp=-0.9; Thr=-0.7;
Gly=-0.4; Ser=-0.8; Asp=-3.5; Asn=-3.5; Pro=-1.6;
Arg=-4.5.



(b) the average steric hindrance defined as the total number 
of heavy atoms (not hydrogens) in the side chain of an 
amino acid. 



(c) the average helical content assigned according to the
DSSP algorithm. The helical character of a residue is
set equal to 1 if it is classified as H (alpha helix), G (3/10
helix) or I (pi helix), and 0 otherwise.
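
The observables (a) and (c) can be turned into terminal profiles along the following lines. The sketch assumes one-letter amino acid codes and a per-residue string of DSSP states as input; the helper names are hypothetical, and the averaging simply takes the mean over proteins at each distance from the chosen terminus.

```python
# Kyte-Doolittle values from item (a), keyed by one-letter code.
KYTE_DOOLITTLE = {
    "A": 1.8, "C": 2.5, "L": 3.8, "M": 1.9, "E": -3.5, "Q": -3.5,
    "H": -3.2, "K": -3.9, "V": 4.2, "I": 4.5, "F": 2.8, "Y": -1.3,
    "W": -0.9, "T": -0.7, "G": -0.4, "S": -0.8, "D": -3.5, "N": -3.5,
    "P": -1.6, "R": -4.5,
}
HELIX_STATES = {"H", "G", "I"}  # alpha, 3/10 and pi helix in DSSP

def hydrophobicity(sequence):
    """Per-residue Kyte-Doolittle hydrophobicity."""
    return [KYTE_DOOLITTLE[aa] for aa in sequence]

def helical_flags(dssp_states):
    """1 for residues in a helical DSSP state, 0 otherwise."""
    return [1 if s in HELIX_STATES else 0 for s in dssp_states]

def average_profile(per_protein_values, max_dist, terminus):
    """Mean value at distances 0..max_dist-1 from the N or C terminus.

    Every protein is assumed to have at least max_dist residues.
    """
    profiles = []
    for values in per_protein_values:
        ordered = values if terminus == "N" else values[::-1]
        profiles.append(ordered[:max_dist])
    count = len(profiles)
    return [sum(p[d] for p in profiles) / count for d in range(max_dist)]
```

Reversing each per-residue list before truncation is what aligns the C-terminal residues of proteins of different length at the same distance from the terminus.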

Besides these observables we have also considered the heli- 
cal propensities predicted by the GOR-IV algorithm described 
in ref. [20]. In this method the probability of an amino acid 
to belong to an alpha helix is estimated from its primary- 
sequence neighborhood, through a set of coefficients express- 
ing the conditional probability that a given pair of amino acids 
at a fixed sequence separation belongs to a secondary structure 
motif. These coefficients are learned on the set of 373 selected 
proteins from which we removed all the residues at a sequence 
separation smaller than 30 from each of the two termini. By 
doing so we ensure the statistical reliability of the GOR-IV 
results for the proteins' termini, since none of the structural 
motifs to be predicted is included in the training set. The
original source code of the GOR-IV program was compiled setting
the Nterm and Cterm parameters equal to zero, so as to allow
secondary structure predictions also for residues very close to
the terminus (otherwise set to "coil" by default). Suitable
normalisation factors of the knowledge-based weights were also
introduced to account for the fact that the averaging window
can span less than the default number of 17 residues if the site
is at a sequence distance smaller than 9 from either terminus.
RESULTS AND DISCUSSION

In Ref. [16] it is suggested that amino acids have a higher
propensity to form contacts with residues that are closer to the 
N terminus. This was interpreted as a signature of the pro- 
gressive structural buildup propagating from the N terminus. 




FIG. 1: Backbone of protein 1ttg. (a): Previous and forward
contacts for the two residues, α and β, at sequence separation 6
from the N and C termini (highlighted in red). The spheres denote
the region of space within an interaction cutoff distance of 7.5 Å
from the two reference residues. Amino acids within this cutoff
distance and with a smaller [longer] sequence separation than the
reference residue from the nearest terminus are highlighted in
blue [green]. (b): Color-coded profile for the fraction of
"previous" contacts, $r_p$, for each site of 1ttg. Accordingly,
for the reference sites of the top panel, we have
$r_p(\alpha) \approx 3/8$ and $r_p(\beta) \approx 6/8$.

The previous/forward bias originally observed by Alexandrov [16]
is confirmed by the analysis of our data set, though with
notable changes in perspective and conclusions. In order to
quantitatively characterise the bias, we here compute the
average fraction of previous contacts for each residue, $r_p$,
see Fig. 1(b). It is found that $r_p$ is rather independent of
the length of the proteins in the data set and is practically
unaffected by the omission of residues at the protein's termini.
In terms of the sequence separation, the autocorrelation length
in the values of $r_p$ is about 4. The average value of $r_p$ in
our set is 0.504 ± 0.002, where the statistical error on the
mean was calculated from the dispersion of the sample and
accounting for the sequence-separation correlation. If one
assigns the previous [forward] identity to individual amino
acids based on the fact that $r_p$ is greater [smaller] than
0.5, one finds that, of the nearly 40,000 residues, 50.6% are
of type previous. The asymmetry in the directional preference
of contact formation therefore appears to be minimal. This tiny
asymmetry is amplified by the procedure of ref. [16], where a
hierarchy of majority rules was used to assign a
previous/forward character first to residues and then to
proteins. In fact, the site-wise assignment of the
previous/forward character can be used to define the character
of "blocks" of consecutive residues according to the majority
rule. One therefore finds that, for (non-overlapping) blocks of
size 5, 11 and 17 residues, the fraction of previous-type blocks
is 51.6, 52.3 and 54.1%, respectively. It is therefore clear
that the majority rule amplifies the slight site-wise asymmetry
in a manner that is dependent on the block size. Consequently,
the heterogeneity of protein lengths in a data set makes the
very notion of the average previous/forward character of
proteins problematic. Even in this case, however, the procedure
of ref. [16] applied to our data set yields a fraction of
proteins of type "previous" equal to 59%, a number substantially
lower than the 75% observed in ref. [16]. In summary, the
previous/forward asymmetry, though statistically well-founded,
appears to be much smaller than originally stated. This bias,
previously regarded as a signal of the N-terminal initiation and
propagation of the folding process, may possibly reflect the
genuine chemical inequality of peptide chains under inversion of
the primary sequence.
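
The amplifying effect of a majority rule can be made concrete with a simple binomial estimate. The sketch below assumes, purely for illustration, that sites are statistically independent and each is of type previous with probability 0.506; this assumption does not hold for real proteins, where the previous/forward character is correlated over about 4 residues, so the numbers it produces are not expected to reproduce the observed block fractions.

```python
from math import comb

def majority_probability(p, block_size):
    """Probability that more than half of block_size independent sites
    are of type 'previous', each with probability p (odd block_size)."""
    k_min = block_size // 2 + 1
    return sum(comb(block_size, k) * p ** k * (1 - p) ** (block_size - k)
               for k in range(k_min, block_size + 1))

# Site-wise bias quoted in the text, and the three block sizes used there.
site_bias = 0.506
block_fractions = {n: majority_probability(site_bias, n) for n in (5, 11, 17)}
```

Even under independence the block-level fraction exceeds the site-level one and grows with the block size, which is the qualitative trend reported above; correlations between neighbouring sites are one plausible reason why the observed amplification is stronger.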



STRUCTURAL DIFFERENCES BETWEEN THE TERMINI 

Another possible signature of the progressive build-up of 
the proteins is that, since the N-terminal region is the first to 
exit from the ribosome, it is expected to be organized differ- 
ently than the C-terminal region which should grow over the 
pre-formed protein scaffold [12, 15, 16, 17]. To elucidate this,
we carried out a detailed analysis of structural differences be- 
tween proteins' termini. 

We first consider the average number of contacts, $n_c(i)$,
that amino acids at a given sequence distance, $i$, from the
nearest terminus make with residues that are closer, along
the primary sequence, to the same terminus. The widespread
notion that the N-terminus is more compact than the C 

one fain mm, and the tiny previous/forward bias we 

observed, would imply that $n_c(i)$ should be higher for the N
region. As visible in Fig. 2(a), however, the observed bias
contradicts this expectation. In a region that extends up to 10
residues away from the termini it is the C region that appears
to be richer in internal contacts by an amount that has a high
statistical significance. The difference is still larger than the
error bar at a distance of 20 from the termini. The conclusions
are robust against changes of the interaction cutoff, $R_c$, in
the viable range of 6-8 Å and upon the use of a step function
instead of a sigmoidal one for weighting the interactions. It is
important to remark that $n_c(i)$ reflects a propensity to form
contacts within the terminal regions, i.e. disregarding
interactions with residues with sequence distance greater than
$i$ from the reference terminus. In fact, if one considers the
contacts made with any residue irrespective of the sequence
separation from the terminus, then no statistical difference
between the two termini emerges.

FIG. 2: Average values of structural observables as a function of
the distance $i$ of the residue from the nearest terminus. Averages
are taken over the 373 structures listed in Table I. The error bars
are the standard deviation on the average. The thick lines correspond
to regions where the statistical significance (Student's t-test) of
the N-C pointwise difference is larger than 99%. (a): Average number
of contacts that amino acids at a given sequence distance from the
nearest terminus make with residues that are closer to the same
terminus. (b): Average difference (in Å) between the gyration radius
of segments of increasing length at the N and C termini. (c): Fraction
of non-local contacts, i.e. interactions with residues at a sequence
separation larger than 6.

To clarify the structural basis for the bias shown in Fig. 2(a)
we monitored the average gyration radius of the terminal
regions. This quantity is a direct measure of the difference in
compactness. The results, shown in Fig. 2(b), demonstrate that
the C-terminal region has a smaller average radius. For
instance, the average gyration radii of the first and the last
15 amino acids of a protein are 9.1 and 8.7 Å respectively.
The difference has a statistical relevance higher than 99% up
to $i = 16$. Finally, we analyzed the average propensity of
the amino acids to form non-local contacts, i.e. contacts with 
residues at a sequence separation larger than 6 ||8|] . Also in 
this case there is a statistically-significant difference reveal- 
ing a greater propensity of the C-terminal region to form local 
contacts than the N-terminal counterpart, as visible in Fig. 2(c).
These results unambiguously show that, on average, the C- 
terminus is more compact than the N-one. 
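
The gyration radius of a terminal segment can be computed as in the sketch below, which assumes C-alpha coordinates given as (x, y, z) tuples in Å and treats all residues with equal weight; both the equal weighting and the function names are assumptions of this example.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid."""
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    mean_sq = sum(sum((c[k] - centroid[k]) ** 2 for k in range(3))
                  for c in coords) / n
    return math.sqrt(mean_sq)

def terminal_gyration_radius(ca_coords, i, terminus):
    """Gyration radius of the first (N) or last (C) i residues."""
    segment = ca_coords[:i] if terminus == "N" else ca_coords[-i:]
    return radius_of_gyration(segment)
```

Evaluating `terminal_gyration_radius` with i = 15 for both termini of every protein and averaging over the data set would give the kind of comparison quoted above for the first and last 15 amino acids.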

Since a low contact order in proteins is an indicator of
high helical content [24], we have analyzed the
secondary-structure content in the proteins' terminal regions
by means of the DSSP algorithm [25]. The results, shown in
Fig. 3(c), highlight the higher probability of the C-terminal
region to attain helical conformations, up to at least $i = 15$,
consistently with other structural studies [12, 26]. As apparent
in Fig. 3(c), the helical content has a maximum at separation
$i \approx 15$ for the C terminus and at separation
$i \approx 25$ for the N one. As a result, for $20 < i < 30$, it
is instead more common to observe helices in the N-terminal
regions. This secondary-structure bias is responsible not only
for the observed difference of contact order but also for the
higher number of contacts formed within the C-terminal region
rather than the N one: since the average overall number of total
contacts is the same in the two regions, a higher helical
content implies a higher number of local contacts.

FIG. 3: Average values of sequence-based observables as a function
of the distance, $i$, of the residue from the nearest terminus. Line
colors, thickness and error bars follow the same convention as in
Figure 2. (a): Average hydrophobicity according to the Kyte-Doolittle
scale. (b): Average steric hindrance. (c): Average structural helical
content identified with the DSSP algorithm. (d): Average helical
propensity as predicted by the GOR-IV algorithm.

The observed differences between the two termini are
highly statistically significant, and their relationship to
vectorial growth must be addressed, since the presence of
structural biases could be evidence in favor of conformational
trapping resulting from an out-of-equilibrium build-up [13, 17].
However, an average difference in the structure of the termini
is not necessarily in contradiction with Anfinsen's hypothesis.
In fact, the structural bias may reflect a systematic difference
in the primary sequence at the termini. Such differences in
sequence composition have already been reported in the
literature [27] but have been found to be statistically
significant only for residues very close to the termini, and it
would be surprising if these small differences resulted in the
bias we observed.

In order to quantitatively address this issue, we considered
several sequence-related properties, looking for an average
difference between the N and the C regions. Remarkably, the
two termini could not be distinguished by any point-wise (i.e.
single-amino acid) property we considered. An example is
provided in Fig. 3(a), where we plot the average hydrophobicity
as a function of the distance from the termini: no statistically
significant difference is observed between the two regions.
We also considered the average helical propensity, estimated
through the knowledge-based Chou-Fasman [28] parameters,
and the average steric hindrance, defined as the number of
side chain heavy atoms (Fig. 3(b)). For these indicators, too, we
did not observe any statistically significant difference between
the two termini. We have also tested whether the two termini are
distinguished by the effective pairwise interactions [29, 30]
among amino acids. Considering residues up to a given separation
from each terminus, we have calculated the energy resulting
from the interaction of all pairs of residues in the segment.
By doing so we ascertain whether, within the limitations of the
energy scoring function, the two termini have different average
propensities for self-interaction, and hence compactness or
locality. Also in this case no statistically significant
difference was found.

These results do not necessarily imply that the structural
inequality of the termini is not encoded by the primary
sequence, since they may only reflect the limitations of
point-wise and mean-field indicators. To improve the analysis
we resorted to the powerful GOR-IV scheme [20] for predicting
proteins' secondary structure from the mere knowledge of their
sequence. The information theory approach of ref. [20] was
chosen because it has a good prediction performance and yet
does not rely on structural alignment, which could bias the
prediction at the termini.

The results are summarized in Fig. 3(d) and reveal that the
GOR-IV approach is able to predict the correct bias on the
termini helical propensity, at least in the $i < 15$ region. The
same conclusions hold using a jackknife scheme where the
prediction on each protein is done with the parameters learnt on
all other proteins in the set. The N- and C-terminal difference
in the average helical content predicted by the GOR-IV
algorithm is of the same order as the structurally observed one.
While the average behaviour is thus captured, on individual
proteins the typical fraction of residues in helical
conformations that are correctly predicted is about
$p_1 = 0.65$, while the fraction of non-helical residues that are
mistakenly predicted as helical is $p_2 = 0.17$. These numbers
may be taken as indicators of the average reliability of the
predictions. Over our finite sample we saw that $p_1$ and $p_2$
are equal to 0.62 and 0.16 respectively over residues at
distance 5-17 from the N-terminus and 0.66 and 0.22 for the
C-terminal ones. Though these fluctuations may simply reflect
the finiteness of the sample, it is instructive to consider them
as genuine differences in the performance of the GOR-IV scheme
at the termini. Even in this case the putative difference in
performance would be responsible for less than half the
difference in predicted helical propensity. According to the
Student's t-test, the remaining pointwise difference would have
a probability of less than 7% to be generated by chance. We
observe that the region of high statistical significance in
Fig. 3(d) spans many more residues than those over which GOR
predictions appear to be correlated (4.5 residues).

This shows that the difference in helical content and,
therefore, the difference in compactness between the termini, is
indeed encoded in the primary sequence, though it cannot be
picked up by intuitive point-wise indicators [27].



CONCLUSIONS 

In order to elucidate the possible role of out-of-equilibrium
effects in determining the native structures of proteins, we
analyzed the structural differences between the two termini. We
have found that the C terminus has a higher helical content
(Fig. 3(c)), a smaller gyration radius (Fig. 2(b)), and contains
a larger number of local contacts (Fig. 2(a) and Fig. 2(c)) than
the N terminus. These results contradict previous observations
that, based on the intuitive image of a progressive protein
build-up, argued for the higher compactness of the N terminus.
The use of a sequence-based secondary structure prediction
method revealed that the observed structural asymmetry of the
termini is encoded in subtle differences of the primary sequence
at the protein ends. This is consistent with Anfinsen's
hypothesis, and it removes the need to invoke out-of-equilibrium
effects to account for the terminal structural inequality. Of
course, the possibility that naturally selected proteins have
evolved so as to exploit kinetic biases to reach the global free
energy minimum cannot be ruled out, as already envisaged
by Levinthal I3l l3lll32[ 13311 . The results presented here pose 
the question of the biophysical rationale behind the sequence 
and structural inequality of the two termini. Though this issue 
is beyond the scope of the present analysis, it is tempting to 
speculate that the presence of this average terminal difference 
across a large set of unrelated proteins may be the result of 
evolutionary pressure, e.g. for folding cooperativity [34].

1jaj 1ib8 116u 1n9j 117y 1f53 1n9d 1wtu 1wkt 1lab
1n91 1f40 1gnc 1fl6 1j5i 1j57 1clh 1ckv 1lmz 1j3g
1ttg 1j2o 1n3k 1n3j 1cj5 1tnn 1tiu 1bsh 1qkl 1buy
1sxl 1bvh 1bw3 1m4o 1m4p 1eol 1emw 1iqo 1h6q 1ej5
1eio 1irl 1i4v 1cdb 1rip 1mwb 1mm4 1qr5 1ddb 1df3
1jr5 1jrm 1gVo 1jt8 1jw3 1g4g 1jyt 1nzt 1k0h 1nwb
1jjj 1hqi 1orm 1fzt 1ab3 1gh8 1nr3 1adn 1k8h 1ghh
1ag4 1jfw 1dlr 1pba 1ap7 1aps 1jdq 1ily 1fo5 1fjc
1kot 1cur 1b6f 1hce 1fbr 1b9r 1yub 1o3s 1fez 1f5t
1exj 1qle 1fql 1b26 1maw 1rpt 1qpv 1gtP 2avi 1jum
1im0 21dx 1wbc 119a 1oft 2nmt 1681 1qso 1b91 7mht
1j5s 1jfm 1odg 1nlx 1cid 1mok 3pva 1jik 4ald 1ysc
1ki9 1j0c 1ulb 1by3 1i9b 1fvp 1ci0 1ibl 1exc 1ef9
1jr4 1dk4 1iw7 1d9u 1lwh 1g0t 1hcn 1efp 1i4w 1ltb
1dir 1kdq 1g5z 1cc5 1qhh 1ixy 1oo5 1h21 1fb3 2thi
1hup 1hm8 1i4n 1ith 1gla 1lpb 1ib5 1kfq 1mow 1b35
1pdy 1h3q 1eoi 1kho 1cvm 2sas 1gsq 2pfk 1k0k 1ggl
1jgs 1bl0 1k3r 1ej3 1g71 1n5d 1ee6 1itq 8prn 1kgt
1mlb 1gan 1a65 115p 1ufh 1eje 1i8n 1fvz 2mjp 1jlx
1iof 1kte 1jyb 1jzk 1ji3 1e5f 1ko9 1k3b 1kvs 1ash
1id2 1i4z 1g64 1otg 1eom 1i0i 1hzi 1eum 1dm9 1oa9
1el6 1bys 1hbk 1coz 1ipb 1it6 1tl2 2ubp 1epm 1k04
1qhd 1ig3 1jhs 1mr8 1c2a 1qmy 1b93 1b8a 1ogh 1gak
1jmv 1jku 1ew2 1g8e 1fpo 1dqe 2spc 1hdk 1mug 1mml
1cmb 1jq3 1jyh 1rpj 1jh6 1ld8 1ela 1hxn 1c44 1pgs
1iab 1mqv 1dk0 1cv8 1kpt 1gwy 1gxy 1qst 1mk4 1dqi
2bop 1kzq 1htw 1nep 1ees 1thx 1mol 1ako 112q 1d3v
1cqm 1cxy 1iby 1cnv 1dj7 1hqk 1it2 1gvp 1jhj 1fi2
1b5e 1fs7 1uaq 3pvi 1lyc 1na3 1iv3 1cip 1i0r 1plm
1g2q 1lmi 1bx4 1whi 1f71 117a 1n7o 1opd 1ezm 1dqz
1lo7 1dj0 1nf9 1brt 1bqc 2sns 8abp 1dzk 1uca 1lc5
1jig 1qre 1idp 1is3 1aba 1e6u 1fp2 1es5 3vub 1ew4
1eca 4eug 1jl7 1iw0 1oaf 1ezg 1llf 1h2w 1n8v 21is
1jH 1es9 1o8x 1dbf 1flm 1lq9 1kal 1qdd 1qks 1qau
1j96 1ird 1e29 1obo 1jgl 1o08 1ml5 1gu2 1h97 1mln
1i8o 1kng 1n8k 1jf8 1k7c 1kt6 1oh0 1qlw 1f86 1ql0
1qj4 1c5e 1n62 1jcl 1m2d 1psr 1kqp 1mfm 1c7k 1ga6
1ug6 1k4i 1iqz

1tic 1qrj 1ffk 1ffk 1ffk 1ffk 1ffk 1ffk 1ffk 1m57
2atc 1fl7 1qle His 1f51 1n32 1n32 1n32 1n32 1n32
1qb3 1gph 1hm7 1nbq 1prt 1prt 1fvv 1nsk 1i50 1k83
1i50 1is7 1gvm 1qax 1khr 1mae 1kf6 1kf6 1kf6 1178
1iw7 1lhr 1bvp 1h31 1f5q 1pdn 1bmq 1qhh 1qhh 1nbw
1jrk 1jj2 1jtd 1mbx 1jc5 1epb 1cz3 1h4m 1k8k 1cew
1kx5 1gyh 1hke 1n71 1imb 1jiw 1lj9 1o26 1gy7 1gpq
4ubp 1lk9 1gd0 1o9r 1fm0 1o7n 1gk8 1jo0 1i0d 1hyo
1mqk 1mqk 1qft 2tps 1mln

TABLE I: PDB codes of the 373 monomeric proteins (top) and the 85 
multimeric proteins (bottom) used for the sequence/structure analy- 
sis. 



